Empirical Speedup Study of Truly Parallel Data
Abstract
We present an empirical study of novel work-optimal parallel algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet. To validate these theoretical algorithms, we implement them on XMT, an experimental computing platform developed at the University of Maryland specifically to support parallel algorithms. We show speedups of up to 25x for compression and 13x for decompression versus bzip2, the de facto standard implementation of Burrows-Wheeler compression. Unlike existing approaches, which assign an entire block (e.g., 900KB) to a processor that processes it serially, our approach is “truly parallel”: it processes the entire input in parallel. Besides the theoretical interest in solving the “right” problem, the importance of data compression speed for small inputs, even at great expense of quality (compressed size of data), is demonstrated by the introduction of Google’s Snappy for MapReduce. Perhaps surprisingly, we show that it is feasible to hold on to quality while even beating Snappy on speed. In turn, this work adds new evidence in support of the XMT/PRAM thesis: that an XMT-like many-core hardware/software platform may be necessary for enabling general-purpose parallel computing. Comparison of our results to recently published work suggests a 70x improvement over what current commercial parallel hardware can achieve.
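For readers unfamiliar with the transform at the core of Burrows-Wheeler compression, the following is a minimal serial sketch in Python. It is not the work-optimal parallel algorithm studied in the paper, and it is not how bzip2 is implemented: it sorts all rotations naively, which is quadratic in memory. The function names `bwt` and `inverse_bwt` and the choice of `"\0"` as an end-of-string sentinel are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert the transform by following the stable-sort permutation."""
    # order[i] is the position in `last` of the i-th character of the
    # first (sorted) column; the stable sort keeps equal characters in order.
    order = sorted(range(len(last)), key=lambda i: last[i])
    # The row that ends with the sentinel is the original string itself.
    i = last.index(sentinel)
    out = []
    for _ in range(len(last) - 1):
        i = order[i]          # move to the rotation shifted left by one
        out.append(last[i])   # its last character is the next input character
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(inverse_bwt(transformed))
```

The transform tends to group equal characters into runs, which is what makes the later compression stages effective. The sketch only fixes the input/output behaviour of the transform; the paper's contribution concerns computing it in parallel over the entire input.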
Related resources
Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method
In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over the volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...
A Implementation of the Karp-Zhang Parallel Branch-and-Bound Algorithm
This paper studies an implementation of the Karp-Zhang Parallel Branch-and-Bound algorithm on a shared memory machine. By employing it to solve a solitaire card puzzle, empirical data on the speedup of the algorithm is (going to be) obtained.
A New Approach to Solve N-Queen Problem with Parallel Genetic Algorithm
Over the past few decades, great efforts have been made to solve uncertain hybrid optimization problems. The n-Queen problem is one such problem, for which many solutions have been proposed. The traditional methods for solving it are exponential in runtime and unacceptable in space and memory complexity. In this study, parallel genetic algorithms are proposed to solve...
Parallel Spatial Pyramid Match Kernel Algorithm for Object Recognition using a Cluster of Computers
This paper parallelizes the spatial pyramid match kernel (SPK) implementation. Used together with a support vector machine classifier, SPK is one of the most widely used kernel methods and achieves high accuracy in object recognition. The MATLAB Parallel Computing Toolbox has been used to parallelize SPK. In this implementation, MATLAB Message Passing Interface (MPI) functions and features included in the toolbox help u...
Empirical Study of a Dataflow Language on the CM
This paper presents empirical data on the behavior of large dataflow programs on a distributed memory multiprocessor. The programs, written in the dataflow language Id90, are compiled via a Threaded Abstract Machine (TAM) for the CM-5. TAM refines dataflow execution models by addressing critical constraints that modern parallel architectures place on the compilation of general-purpose parallel prog...